Abstract

The algorithm designed in [12, 15] was the very first distributed algorithm to solve the
mutual exclusion problem in complete networks by using a dynamic logical tree structure
as its basic distributed data structure, viz. a path reversal transformation in rooted n-node
trees; it was also the first one to achieve a logarithmic average-case message complexity.
The present paper proposes a direct and general approach to compute the moments
of the cost of path reversal. It basically uses one-one correspondences between combinatorial
structures and the associated probability generating functions: the expected cost of path
reversal is thus proved to be exactly $H_{n-1}$. Moreover, the time and message complexity of
the algorithm, as well as randomized bounds on its worst-case message complexity in arbitrary
networks, are also given. The average-case analysis of path reversal and the analysis
of this distributed algorithm for mutual exclusion are thus fully completed in the paper.
The general techniques used should also prove applicable and fruitful when adapted to the
most efficient recent tree-based distributed algorithms for mutual exclusion, which require
powerful tools, particularly for average-case analyses.

1 Introduction 

A distributed system consists of a collection of geographically dispersed autonomous sites, which 
are connected by a communication network. The sites (or processes) have no shared memory 
and can only communicate with one another by means of messages. 

In the mutual exclusion problem, concurrent access to a shared resource, called the critical
section (CS), must be synchronized such that at any time, only one process can access the (CS).
Mutual exclusion is crucial for the design of distributed systems. Many problems involving 
replicated data, atomic commitment, synchronization, and others require that a resource be 
allocated to a single process at a time. Solutions to this problem often entail high communication 
costs and are vulnerable to site and communication failures. 

Several distributed algorithms exist to implement mutual exclusion ([1, 3, 9, 10, 13, 14, 15],
etc.). They are usually designed for complete or general networks, and the most recent ones are
often fault tolerant. But, whatever the algorithm, it is either permission-based or token-based,
and thus it uses appropriate data structures. Lamport's permission-based algorithm
[9] maintains a waiting queue at each site, and the message complexity of the algorithm is
$3(n - 1)$, where n is the number of sites. Several algorithms were presented later which reduce
the number of messages to $\Theta(n)$ with a smaller constant factor [3, 14]. Maekawa's
permission-based algorithm [10] imposes a logical structure on the network and only requires
$c\sqrt{n}$ messages to be exchanged (where c is a constant which varies between 3 and 5).

The token-based algorithm A (see [12, 15]), which is analysed in the present paper, is the first
mutual exclusion algorithm for complete networks which achieves a logarithmic average message
complexity; it is also the very first one to use a tree-based structure, namely a path reversal,
as its basic distributed data structure. More recently, various mutual exclusion algorithms (e.g.
[1, 13], etc.) have been designed which use either the same data structure or some very close
tree-based data structures. They usually also provide efficient (possibly fault tolerant) solutions
to the mutual exclusion problem.

The general model used in [12, 15] to design algorithm A assumes the underlying communication
links and the processes to be reliable. Message propagation delay is finite but unpredictable,
and the messages are not assumed to obey the FIFO rule. A process entering the (CS) releases it
within a finite delay. Moreover, the communication network is complete. To ensure fair mutual
exclusion, each node in the network maintains two pointers, Last and Next, at any time. Last
indicates the node to which requests for (CS) access should be forwarded; Next points to the node
to which access permission must be forwarded after the current node has executed its own (CS).
As described below, the dynamic updating of these two pointers involves two distributed data
structures: a waiting queue, and a dynamic logical rooted tree structure which is nothing but a
path reversal. Algorithm A is thus very efficient in terms of average-case message complexity,
viz. $H_{n-1} = \ln n + O(1)$.¹

Let us now recall how the two data structures at hand are actually involved in the algorithm,
which is fully designed in [12, 15]. Algorithm A uses the notion of token. A node can enter its
(CS) only if it has the token. However, unlike the concept of a token circulating continuously in
the system, the token is sent from one node to another if and only if a request is made for it.
The token (also called privilege message) consists of a queue of processes which are requesting
the (CS). The token circulates strictly according to the order in which the requests have been
made.

The first data structure used in A is a waiting queue which is updated by each node after
executing its own (CS). The waiting queue of requesting processes is maintained at the node
containing the token and is transferred along with the token whenever the token is transferred.
The requesting nodes receive the token strictly according to the order in the queue. Each node
knows its next node in the waiting queue only if the Next exists. The head is the node which
owns the token and the tail is the last node which requested the (CS). Thus, a path is constructed
in such a way that each request message is transmitted to the tail. Then, either the tail is in
the (CS) and it lets the requesting node enter it, or the tail waits for the token, in which case
the requesting node is appended to the tail.

The second data structure involved in algorithm A gives the path to go to the tail: it is a
logical rooted ordered tree. A node which requests the (CS) sends its message to its Last, and,
from Last to Last, the request is transmitted to the tail of the waiting queue. In such a structure,
every node knows only its Last. Moreover, if the requesting node is not the last, the logical tree
structure is transformed: the requesting node is the new Last and the nodes which are located
between the requesting node and the last will gain the new last as Last. This is typically a logical
transformation of path reversal, which is performed at a node x of an ordered n-node tree $T_n$
consisting of a root with $n - 1$ children. These transformations $\varphi(T_n)$ are performed to
keep a dynamic decentralized path towards the tail of the waiting queue.

¹ Throughout the paper, lg denotes the base two logarithm and ln the natural logarithm.
$H_n = \sum_{i=1}^{n} 1/i$ denotes the n-th harmonic number, with asymptotic value
$H_n = \ln n + \gamma + 1/(2n) + O(n^{-2})$ (where $\gamma = 0.577\ldots$ is Euler's constant).

In [7], Ginat, Sleator and Tarjan derived a tight upper bound of $\lg n$ on the cost of path
reversal, using the notion of amortized cost of a path reversal. The average number of messages
used by algorithm A was obtained in [12] by means of combinatorial and algebraic methods on the
Dyck language (namely by encoding oriented ordered trees $T_n$ with Dyck words).
By contrast, the present paper uses direct and general derivation methods involving one-to-one
correspondences between combinatorial structures such as priority queues, binary tournament
trees and permutations. Moreover, a full analysis of algorithm A is completed in this paper
from the computation of the first and second moments of the cost of path reversal; viz. we
derive the expected and worst-case message complexity of A as well as its average and worst-case
waiting time. Note that the average-case analysis of other efficient tree-based mutual exclusion
algorithms (e.g. [1, 13], among others) may easily be adapted from the present one, since the
data structures involved in such algorithms are quite close to those of algorithm A. Since the
analysis of the average waiting time uses simple birth-and-death process methods and asymptotics,
it could also apply easily to the waiting time analysis of the above-mentioned algorithms. In
this sense, the analyses proposed in this paper are quite general.

The paper is organized as follows. In Section 2, we define the path reversal transformation
performed in a tree $T_n$ and give a constructive proof of the one-one correspondence between
priority queues and the combinatorial structure of trees $T_n$. In Section 3, probability generating
functions are computed which yield the exact expected cost of path reversal, $H_{n-1}$, and the
second moment of the cost. Section 4 is devoted to the computation of the waiting time and the
expected waiting time of algorithm A. In Section 5, more extended complexity results are given,
viz. randomized bounds on the worst-case message complexity of the algorithm in arbitrary
networks. In the Appendix, we propose a second proof technique which directly yields the exact
expected cost of path reversal by solving a simple recurrence equation.

2 One-one correspondences between combinatorial structures

We first define a path reversal transformation in $T_n$ and its cost (see [7]). Then we point out
some one-to-one correspondences between combinatorial objects and structures which are relevant to
the problem of computing the average cost of path reversal. Such one-to-one tools are used in
Section 3 to compute this expected cost and its variance by means of corresponding probability
generating functions.

2.1 Path reversal 

Let $T_n$ be a rooted n-node tree, or an ordered tree with n nodes, according to either [7] or [8,
page 306]. A path reversal at a node x in $T_n$ is performed by traversing the path from x to the
tree root r and making x the parent (or pointer Last) of each node on the path other than x.
Thus x becomes the new tree root. The cost of the reversal is the number of edges on the path
reversed. Path reversal is a variant of the standard path compression algorithm for maintaining 
disjoint sets under union. 
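
The definition above can be illustrated by a minimal Python sketch. It assumes the tree is stored as a dictionary of parent pointers (playing the role of Last); the names `path_reversal` and `last` are illustrative and do not come from [7] or [12, 15].

```python
def path_reversal(last, x):
    """Perform a path reversal at node x; return its cost (edges on the reversed path).

    `last` maps each node to its parent (None for the root) and is updated in place.
    """
    cost = 0
    node = last[x]      # first ancestor of x, or None if x is already the root
    last[x] = None      # x becomes the new root
    while node is not None:
        parent = last[node]
        last[node] = x  # every other node on the path now points to x
        cost += 1       # one more edge of the original path has been reversed
        node = parent
    return cost


# Initial tree: root 0 with four children (n = 5).
last = {0: None, 1: 0, 2: 0, 3: 0, 4: 0}
first = path_reversal(last, 1)   # reverses the path 1 -> 0
second = path_reversal(last, 2)  # reverses the path 2 -> 0 -> 1
print(first, second, last)
```

A reversal at the current root costs nothing, since the path from the root to itself has no edges.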

The average cost of a path reversal performed on an initial ordered n-node tree $T_n$ which
consists of a root with $n - 1$ descendants (or children, as in [7]) is the expected number of edges
on the paths reversed in $T_n$ (see Figure 1). In other words, it is the expected height of such
reversed trees $\varphi(T_n)$, provided that we let the height of a tree root be 1: viz. the height
of a node x in $T_n$ is thus defined as being the number of nodes on the path from the node x to
the root r of $T_n$.

Figure 1: Path reversal $\varphi_x$. The $T_i$'s denote the (left/right) subtrees of $T$.



It turns out that the average number of messages used in A is actually the expected cost of
a path reversal performed on such initial ordered n-node trees $T_n$ which consist of a root with
$n - 1$ children. This is indeed the average number of changes of the variable Last which builds
the dynamic data structure of path reversal used in algorithm A.

2.2 Priority queues, tournament trees and permutations 

Whenever two combinatorial structures are counted by the same number, there exist one-one
mappings between the two structures. Explicit one-to-one correspondences between combinatorial
representations provide coding and decoding algorithms between the structures. We now
need the following definitions of some combinatorial structures which are closely connected with
path reversal and involved in the computation of its cost.

2.2.1 Definitions and notations 

(i) Let [n] be the set {1, 2, ..., n}. A permutation is a one-one mapping $\sigma : [n] \to [n]$;
we write $\sigma \in S_n$, where $S_n$ is the symmetric group over [n].

(ii) A binary tournament tree of size n is a binary n-node tree whose internal nodes are labeled
with consecutive integers of [n], in such a way that the root is labeled 1, and all labels are
decreasing (bottom-up) along each branch. Let $\mathcal{T}_n$ denote the set of all binary tournament
trees of size n. $\mathcal{T}_n$ also denotes the set of tournament representations of all permutations
$\sigma \in S_n$ considered as elements of $[n]^n$, since the correspondence
$\tau : S_n \to \mathcal{T}_n$ is one-one (see [16] for a detailed proof). Note that this one-to-one
mapping implies that $|\mathcal{T}_n| = n!$.

(iii) A priority queue of size n is a set $Q_n$ of keys; each key $K \in Q_n$ has an associated
priority p(K) which is an arbitrary integer. To avoid cumbersome notations, we identify
$Q_n$ with the set of priorities of its keys. Strictly speaking, this is a set with repetitions since
priorities need not be all distinct. However, it is convenient to ignore this technicality and
assume distinct priorities. The simplest representation of a priority queue of size n is then
a sequence $s = (p_1, p_2, \ldots, p_n)$ of the priorities of $Q_n$, kept in their order of arrival.
Assuming the n! possible orders of arrival of the $p_i$'s to be equally likely, a priority queue
$Q_n$ (i.e. a sequence s of $p_i$'s) is defined as random iff it is associated to a random order of
the $p_i$'s. There is a one-to-one correspondence between the set $\mathcal{T}_n$ of all the n-node
binary tournament trees and the set of all the priority queues $Q_n$ of size n. To each sequence
of priorities $s = (p_1, \ldots, p_n) \in Q_n$, we associate a binary tournament tree
$\gamma(s) = T \in \mathcal{T}_n$ by the following rules: let $m = \min(s)$; we then write
$s = \ell\, m\, r$; the binary tree $T \in \mathcal{T}_n$ possesses m as root, $\gamma(\ell)$ as left
subtree and $\gamma(r)$ as right subtree. The rules are applied repeatedly to all the left and right
subsequences of s, from the root of T to the leaves of T; by convention, we let
$\gamma(\emptyset) = \Lambda$ (where $\Lambda$ denotes the empty binary tree). The
correspondence $\gamma$ is obviously one-one (see [6] for a fully detailed constructive proof).

We shall thus use binary tournaments $\mathcal{T}_n$ to represent the permutations of $S_n$ as well
as the priority queues $Q_n$ of size n.

(iv) If $T \in \mathcal{T}_n$ is a binary tournament, its right branch RB(T) is the increasing
sequence of priorities found on the path starting at the root of T and repeatedly going to the right
subtree. The bottom of RB(T) is the node having no right son. The left branch LB(T) of
T is defined in a symmetrical manner.
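
The rule of item (iii) (split the sequence at its minimum, then recurse on the left and right parts) and the branches of item (iv) can be sketched as follows. This is only an illustration: the representation of a tree as a nested tuple `(left, root, right)` with `None` for the empty tree, and the function names, are assumptions of the sketch.

```python
from itertools import permutations


def tournament(s):
    """Binary tournament tree of a sequence of distinct priorities.

    The tree is a nested tuple (left, root, right); None is the empty tree.
    """
    if not s:
        return None
    i = min(range(len(s)), key=s.__getitem__)  # position of m = min(s)
    return (tournament(s[:i]), s[i], tournament(s[i + 1:]))


def right_branch(tree):
    """Priorities met from the root by repeatedly going to the right subtree."""
    branch = []
    while tree is not None:
        _, m, right = tree
        branch.append(m)
        tree = right
    return branch


def left_branch(tree):
    """Symmetric to right_branch: repeatedly go to the left subtree."""
    branch = []
    while tree is not None:
        left, m, _ = tree
        branch.append(m)
        tree = left
    return branch


t = tournament((3, 1, 4, 2))
print(t, right_branch(t), left_branch(t))

# Distinct sequences give distinct trees: 3! = 6 trees for n = 3.
trees = {tournament(p) for p in permutations((1, 2, 3))}
print(len(trees))
```

An in-order traversal of the tree gives back the original sequence, which is why the mapping can be inverted.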

2.3 The one-one correspondence between $Q_n$ and $T_n$

We now give a constructive proof of a one-to-one correspondence mapping the given combinatorial
structure of ordered trees $T_n$ (as defined in the Introduction) onto the priority queues $Q_n$.

Theorem 2.1 There is a one-to-one correspondence between the priority queues of size n, $Q_n$,
and the ordered n-node trees $T_n$ which consist of a root with $n - 1$ children.

Proof There are many representations of priority queues $Q_n$; let us consider the n-node binary
heap structure, which is very simple and perfectly suitable for the constructive proof.

• First, a binary heap of size n is an essentially complete binary tree. A binary tree is
essentially complete if each of its internal nodes possesses exactly two children, with the
possible exception of a unique special node situated on level $(h - 1)$ (where h denotes the
height of the heap), which may possess only a left child and no right child. Moreover, all
the leaves are either on level h, or else they are on levels h and $(h - 1)$, and no leaf is found
on level $(h - 1)$ to the left of an internal node at the same level. The unique special node,
if it exists, is to the right of all the other level $(h - 1)$ internal nodes in the subtree.
Besides, each tree node in a binary heap contains one item, with the items arranged in
heap order (i.e. the priority queue ordering): the key of the item in the parent node
is strictly smaller than the key of the item in any descendant's node. Thus the root is
located at position 1 and contains an item of minimum key. If we number the nodes of
such an essentially complete binary tree from 1 to n in heap order and identify nodes with
numbers, the parent of the node located at position x is located at $\lfloor x/2 \rfloor$.
Similarly, the left son of node x is located at 2x and its right son at $\min\{2x + 1, n\}$.
We can thus represent each node by an integer and the entire binary heap by a map from [n]
onto the items: the binary heap with n nodes fits well into locations 1, ..., n. This forces a
breadth-first, left-to-right filling of the binary tree, i.e. a heap or priority queue ordering.

• Next, it is well-known that any ordered tree with n nodes may easily be transformed into
a binary tree by the natural correspondence between ordered trees and binary trees. The
corresponding binary tree is obtained by linking together the sibling nodes of the given
ordered tree and removing vertical links except from a father to its first (left) son.
Conversely, it is easy to see that any binary tree may be represented as an ordered tree by
reversing the process. The correspondence is thus one-one (see [8, Vol. 1, page 333]).

Note that the construction of a binary heap of size n can be carried out in linear time, and
more precisely in O(n) sift-up operations.
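
The position arithmetic of the binary heap can be sketched in Python as follows. The sketch is illustrative only: it uses the standard-library `heapq` module (0-based lists) to build the heap, converts to the 1-based positions used in the proof, and, as an assumption of the sketch, returns `None` for a missing child rather than using the $\min\{2x + 1, n\}$ convention. The helper names are hypothetical.

```python
import heapq


def heap_positions(n, x):
    """Parent, left son and right son of position x (1-based) in a heap of size n.

    None marks a missing parent or child (a convention of this sketch).
    """
    parent = x // 2 if x > 1 else None
    left = 2 * x if 2 * x <= n else None
    right = 2 * x + 1 if 2 * x + 1 <= n else None
    return parent, left, right


def is_heap_ordered(items):
    """Check that every key is strictly larger than the key at its parent position.

    items[x - 1] holds the key stored at 1-based position x; keys are assumed distinct.
    """
    n = len(items)
    return all(items[x // 2 - 1] < items[x - 1] for x in range(2, n + 1))


keys = [5, 3, 8, 1, 9, 2]
heapq.heapify(keys)               # linear-time, in-place heap construction
print(keys, is_heap_ordered(keys))
print(heap_positions(6, 3))       # position 3 has a left son only when n = 6
```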

Now, to each sequence of priorities $s = (p_1, \ldots, p_n) \in Q_n$, we may associate a unique
n-node tree $\alpha(s) = T_n$ in the natural breadth-first, left-to-right order; by convention, we
also let $\alpha(\emptyset) = \Lambda$. In such a representation, $T_n = \alpha(s)$ is then an
ordered n-node tree the ordering of which is the priority queue (or heap) order, and it is thus
built as an essentially complete binary heap of size n. The correspondence $\alpha$ naturally
represents the priority queues $Q_n$ of size n as ordered trees $T_n$ with n nodes.

Conversely, to any ordered tree $T_n$ with n nodes, we may associate a binary tree with heap
ordered nodes, that is an essentially complete binary heap. Hence, there exists a correspondence
mapping any given ordered n-node tree $T_n$ onto a unique sequence of priorities
$s = \beta(T_n) \in Q_n$; by convention we again let $\beta(\Lambda) = \emptyset$.

The correspondence is one-one, and it is easily seen that the mappings $\alpha$ and $\beta$ are
respective inverses. □

Let binary tournament trees represent each one of the above structures. Any operation
can thus be performed as if dealing with ordered trees $T_n$, whereas binary tournament trees
or permutations are really manipulated. More precisely, since we know that
$T_n \leftrightarrow Q_n \leftrightarrow \mathcal{T}_n \leftrightarrow S_n$, the cost of path reversal
performed on initial n-node trees $T_n$ which consist of a root with $n - 1$ children is transported
from the $T_n$'s onto the tournament trees $T \in \mathcal{T}_n$ and onto the permutations
$\sigma \in S_n$. In the following definitions (see Section 3.1 below), we therefore let
$\varphi(\sigma) \in S_n$ denote the "reversed" permutation which corresponds to the reversed tree
$T_n$. From this point the first moment of the cost of path reversal, $\varphi : T_n \to T_n$, can be
derived, and a straightforward proof technique of the result, distinct from the one in Section 3
below, is also detailed in the Appendix.

3 Expected cost of path reversal, average message complexity of A

The Introduction details how the two data structures at hand are actually involved in
algorithm A; the design of the algorithm itself is given in [12, 15].

3.1 Analysis 

Eq. (13), proved in the Appendix, is actually sufficient to provide the average cost of path
reversal. However, since we also wish to know the second moment of the cost, we need the
probability generating function of the probabilities $p_{n,k}$, defined as follows.

Let $h(T_n)$ denote the height of $T_n$, i.e. the number of nodes on the path from the deepest
node in $T_n$ to the root of $T_n$, and let $T \in \mathcal{T}_{n-1}$. Then

$$p_{n,k} = \Pr\{\text{cost of path reversal for } T_n \text{ is } k\} = \Pr\{h(\varphi(T)) = k\}$$

is the probability that the tournament tree $\varphi(T)$ is of height k. We also have

$$p_{n,k} = \Pr\{k \text{ changes occur in the variable Last of algorithm A}\}.$$

More precisely, let a swap be any interchanged pair of adjacent prime cycles (see [8, Vol. 3,
pages 28-30]) in a permutation $\sigma$ of $[n - 1]$ to obtain the "reversed" permutation
$\varphi_x(\sigma)$ corresponding to the path reversal performed at a node $x \in T_n$, that is any
interchange which occurs in the relative order of the elements of $\varphi_x(\sigma)$ from the one of
$\sigma$'s elements, and let N be the number of these swaps occurring from
$\sigma \in S_{n-1}$ to $\varphi_x(\sigma)$. Then,

$$p_{n,k} = \frac{1}{(n-1)!}\,(\text{number of } \sigma \in S_{n-1} \text{ for which } N = k),$$

since the cost of a path reversal at the root of an ordered tree such as $T_n$ is zero.

Lemma 3.1 Let $P_n(z) = \sum_{k \ge 0} p_{n,k} z^k$ be the probability generating function of the
$p_{n,k}$'s. We have the following identity,

$$P_n(z) = \prod_{j=1}^{n-1} \frac{z + j - 1}{j}.$$

Proof We have $p_{1,0} = 1$ and $p_{1,k} = 0$ for all $k > 0$.

A fundamental point in this derivation is that we are averaging not over all tournament trees
$T \in \mathcal{T}_{n-1}$ but over all possible orders of the elements of $S_{n-1}$. Thus, every
permutation of $(n - 1)$ elements with k swaps corresponds to $(n - 2)$ permutations of $(n - 2)$
elements with k swaps and one permutation of $(n - 2)$ elements with $(k - 1)$ swaps. This leads
directly to the recurrence

$$(n-1)!\,p_{n,k} = (n-2)\,(n-2)!\,p_{n-1,k} + (n-2)!\,p_{n-1,k-1},$$

or

$$p_{n,k} = \left(1 - \frac{1}{n-1}\right) p_{n-1,k} + \frac{1}{n-1}\, p_{n-1,k-1}. \qquad (1)$$

Consider any permutation $\sigma = (\sigma_1 \ldots \sigma_{n-1})$ of $[n - 1]$. Formula (1) can also
be derived directly with the argument that the probability of N being equal to k is the simultaneous
occurrence of $\sigma_i = j$ ($1 \le i, j \le n - 1$) and N being equal to $k - 1$ for the remaining
elements of $\sigma$, plus the simultaneous occurrence of $\sigma_i \ne j$ ($1 \le i, j \le n - 1$)
and N being equal to k for the remaining elements of $\sigma$. Therefore,

$$p_{n,k} = \Pr\{\sigma_i = j\} \times p_{n-1,k-1} + \Pr\{\sigma_i \ne j\} \times p_{n-1,k}
= \frac{1}{n-1}\, p_{n-1,k-1} + \left(1 - \frac{1}{n-1}\right) p_{n-1,k}.$$
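
As a small sanity check of recurrence (1), the cases n = 2 and n = 3 can be worked out by hand, starting from $p_{1,0} = 1$ and $p_{1,k} = 0$ for $k > 0$ (a worked illustration, not part of the original derivation):

```latex
\begin{align*}
p_{2,1} &= \Bigl(1 - \tfrac{1}{1}\Bigr)\, p_{1,1} + \tfrac{1}{1}\, p_{1,0} = 1, \\
p_{3,1} &= \Bigl(1 - \tfrac{1}{2}\Bigr)\, p_{2,1} + \tfrac{1}{2}\, p_{2,0} = \tfrac{1}{2}, \\
p_{3,2} &= \Bigl(1 - \tfrac{1}{2}\Bigr)\, p_{2,2} + \tfrac{1}{2}\, p_{2,1} = \tfrac{1}{2}, \\
\sum_{k} k\, p_{3,k} &= \tfrac{1}{2} + 2 \cdot \tfrac{1}{2} = \tfrac{3}{2} = H_2 .
\end{align*}
```

The mean for n = 3 thus agrees with the value $H_{n-1}$ announced in the Abstract.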

Using now the probability generating function $P_n(z) = \sum_{k \ge 0} p_{n,k} z^k$, we get after
multiplying (1) by $z^k$ and summing,

$$(n-1)\,P_n(z) = z\,P_{n-1}(z) + (n-2)\,P_{n-1}(z),$$

which yields

$$P_n(z) = \frac{z + n - 2}{n - 1}\, P_{n-1}(z), \qquad P_1(z) = 1. \qquad (2)$$

The initial condition follows from $p_{1,0} = 1$. The latter recurrence (2) telescopes immediately to

$$P_n(z) = \prod_{j=1}^{n-1} \frac{z + j - 1}{j}.$$

□
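
Lemma 3.1 can be checked numerically for small n. The following Python sketch computes the probabilities $p_{n,k}$ in two ways, by iterating recurrence (1) from $p_{1,0} = 1$ and by expanding the product form, using exact rational arithmetic. The function names are illustrative assumptions.

```python
from fractions import Fraction


def pmf_by_recurrence(n):
    """[p_{n,0}, ..., p_{n,n-1}] obtained by iterating recurrence (1) from p_{1,0} = 1."""
    p = [Fraction(1)]                      # case n = 1
    for m in range(2, n + 1):
        q = []
        for k in range(m):
            stay = p[k] if k < len(p) else Fraction(0)   # p_{m-1,k}
            step = p[k - 1] if k >= 1 else Fraction(0)   # p_{m-1,k-1}
            q.append((1 - Fraction(1, m - 1)) * stay + Fraction(1, m - 1) * step)
        p = q
    return p


def pmf_by_product(n):
    """Coefficients of prod_{j=1}^{n-1} (z + j - 1)/j, lowest degree first."""
    coeffs = [Fraction(1)]
    for j in range(1, n):
        nxt = [Fraction(0)] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * Fraction(j - 1, j)   # constant term (j - 1)/j
            nxt[k + 1] += c * Fraction(1, j)   # term z/j
        coeffs = nxt
    return coeffs


for n in range(1, 9):
    assert pmf_by_recurrence(n) == pmf_by_product(n)
print(pmf_by_recurrence(4))
```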

Remark The property proved by Trehel that the average number of messages required by A is
exactly the number of nodes at height 2 in the reversed ordered trees $\varphi(T_n)$ (see [12]) is
hidden in the definition of the $p_{n,k}$'s. As a matter of fact, the number of permutations of [n]
which contain exactly 2 prime cycles is

$$\left[{n \atop 2}\right] = (n-1)!\,H_{n-1}$$

(see [8]), whence the result.



Theorem 3.1 The expected cost of path reversal and the average message complexity of
algorithm A is $E(C_n) = \overline{C}_n = H_{n-1}$, with variance
$\mathrm{var}(C_n) = H_{n-1} - H^{(2)}_{n-1}$. Asymptotically, for large n,

$$\overline{C}_n = \ln n + \gamma + O(n^{-1}) \quad \text{and} \quad
\mathrm{var}(C_n) = \ln n + \gamma - \pi^2/6 + O(n^{-1}).$$

Proof By Lemma 3.1, the probability generating function $P_n(z)$ may be regarded as the product
of a number of very simple probability generating functions (P.G.F.s), namely, for
$1 \le j \le n - 1$,

$$P_n(z) = \prod_{1 \le j \le n-1} \Pi_j(z), \quad \text{with} \quad
\Pi_j(z) = \frac{j-1}{j} + \frac{z}{j}.$$

Therefore, we need only compute moments for the P.G.F. $\Pi_j(z)$, and then sum for j = 1 to
$n - 1$. This is a classical property of P.G.F.s that one may transform products to sums.

Now, $\Pi_j'(1) = 1/j$ and $\Pi_j''(1) = 0$, and hence

$$E(C_n) = \overline{C}_n = P_n'(1) = \sum_{j=1}^{n-1} \Pi_j'(1) = H_{n-1}.$$

Moreover, the variance of $C_n$ is

$$\mathrm{var}(C_n) = P_n''(1) + P_n'(1) - P_n'(1)^2,$$

and thus,

$$\mathrm{var}(C_n) = \sum_{j=1}^{n-1} \frac{1}{j} - \sum_{j=1}^{n-1} \frac{1}{j^2}
= H_{n-1} - H^{(2)}_{n-1}.$$

Since $H^{(2)}_{n-1} = \pi^2/6 - 1/n + O(n^{-2})$ when $n \to +\infty$, and by the asymptotic
expansion of $H_n$, the asymptotic values of $\overline{C}_n$ and of $\mathrm{var}(C_n)$ are easily
obtained. (Recall that Euler's constant is $\gamma = 0.57721\ldots$, thus
$\gamma - \pi^2/6 = -1.06772\ldots$)

Hence, $\overline{C}_n = 0.693\ldots \lg n + O(1)$, and
$\mathrm{var}(C_n) = 0.693\ldots \lg n + O(1)$. □

Note also that, by a generalization of the central limit theorem to sums of independent but
nonidentical random variables, it follows that

$$\frac{C_n - \overline{C}_n}{(\ln n - 1.06772\ldots)^{1/2}}$$

converges to the normal distribution when $n \to +\infty$.
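
The exact values of Theorem 3.1 can be verified for small n with exact rational arithmetic. The sketch below expands the product of Lemma 3.1 into the distribution of $C_n$, then compares its mean and variance with $H_{n-1}$ and $H_{n-1} - H^{(2)}_{n-1}$. The helper names `pmf`, `harmonic` and `moments` are illustrative assumptions.

```python
from fractions import Fraction


def pmf(n):
    """[p_{n,0}, ..., p_{n,n-1}]: coefficients of prod_{j=1}^{n-1} ((j-1)/j + z/j)."""
    coeffs = [Fraction(1)]
    for j in range(1, n):
        nxt = [Fraction(0)] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * Fraction(j - 1, j)
            nxt[k + 1] += c * Fraction(1, j)
        coeffs = nxt
    return coeffs


def harmonic(m, r=1):
    """Generalized harmonic number: sum of 1/j^r for j = 1..m (0 when m = 0)."""
    return sum(Fraction(1, j ** r) for j in range(1, m + 1))


def moments(n):
    """Exact mean and variance of the cost C_n."""
    p = pmf(n)
    mean = sum(k * pk for k, pk in enumerate(p))
    second = sum(k * k * pk for k, pk in enumerate(p))
    return mean, second - mean * mean


for n in range(1, 12):
    mean, var = moments(n)
    assert mean == harmonic(n - 1)
    assert var == harmonic(n - 1) - harmonic(n - 1, 2)
print(moments(5))
```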

Proposition 3.1 The worst-case message complexity of algorithm A is O(n).

Proof Let $\Delta$ be the maximum communication delay time in the network and let $\delta$ be the
minimum delay time for a process to enter, proceed and release the critical section.
Set $q = \lceil \Delta/\delta \rceil$; the number of messages used in A is at most
$(n-1) + (n-1)q = (n-1)(q+1) = O(n)$. □

Remarks 

1. The one-to-one correspondence between ordered trees with $(n + 1)$ nodes and the words
of length 2n in the Dyck language with one type of bracket is used in [12] to compute the
average message complexity of A. Several properties and results connecting the depth of a
Dyck word and the height of the ordered n-node trees can be derived from the one-to-one
correspondences between combinatorial structures involved in the proof of Theorem 2.1.

2. In the first variant of algorithm A (see [15]), which is analysed here, a node never stores
more than one request of some other node; hence it only requires O(log n) bits to store
the variables, and the message size is also O(log n) bits. This is not true of the second
variant of algorithm A (designed in [11]). Though the constant factor within the order of
magnitude of the average number of messages is claimed to be slightly improved (from 1
down to 0.4), the token now consists of a queue of processes requesting the critical section.
Since at most $n - 1$ processes belong to the requesting queue, the size of the token is
O(n log n). Therefore, whereas the average message complexity is slightly improved (up to
a constant factor), the message size increases from O(log n) bits to O(n log n) bits. The
bit complexity is thus much larger in the second variant [11] of A. Moreover, the state
information stored at each node is also O(n log n) bits in the second variant, which again
is much larger than in the first variant of A.



4 Waiting time and average waiting time of algorithm A. 

Algorithm A is designed with the notion of token. Recall that a node can enter its critical section 
only if it has the token. However, unlike the concept of a token circulating continuously in the 
system, it is sent from one node to another if and only if a request is made for it. The token thus 
circulates strictly according to the order in which the requests have been made. The queue is 
updated by each node after executing its own critical section. The queue of requesting processes 
is maintained at the node containing the token and is transferred along with the token whenever 
the token is transferred. The requesting nodes receive the token strictly according to the order 
in the queue. 

In order to simplify the analysis, the following is assumed. 

• When a node is not in the critical section or is not already in the waiting queue, it generates
a request for the token at Poisson rate $\lambda$, i.e. the arrival process is a Poisson process.

• Each node spends a constant time ($\sigma$ time units) in the critical section, i.e. the rate of
service is $\mu = 1/\sigma$. If we did not assume a constant time spent by each process
in the critical section, $\sigma$ could then be regarded as the maximum time spent in the critical
section, since any node executes its critical section within a finite time.

• The time for any message to travel from one node to any other node in the complete network
is constant and is equal to $\delta$ (communication delay). Since the message delay is finite, we
assume here that every message originated at any node is delivered to its destination in a
bounded amount of time: in other words, the network is assumed to be synchronous.

At any instant of time, a node has to be in one of the following two states:

1. Critical state. 

The node is waiting in the queue for the token or is executing its critical section. In this 
state, it cannot generate any request for the token, and thus the rate of generation of 
request for entering the critical section by this node is zero. 

2. Noncritical state. 

The node is not waiting for the token and is not executing the critical section. In this state,
this node generates a request for the token at Poisson rate $\lambda$.

4.1 Waiting time of algorithm A. 

Let $S_k$ denote the system state when exactly k nodes are in the waiting queue, including the
one in the critical section, and let $P_k$ denote the probability that the system is in state $S_k$,
$0 \le k \le n$. In this state, only the remaining $n - k$ nodes can generate a request. Thus, the
net rate of request generation in such a situation is $(n - k)\lambda$. Now the service rate is
constant at $\mu$ as long as k is positive, i.e. as long as at least one node is there to execute its
critical section, and the service rate is 0 when k = 0. The probability that during the time period
(t, t + h) more than one change of state occurs at any node is o(h). By using a simple
birth-and-death process (see Feller, [4]), the following theorem is obtained.

Theorem 4.1 When there are k ($k > 0$) nodes in the queue, the waiting time of a node for the
token is $w_k = (k-1)(\sigma + \delta) + \sigma/2$, where $\sigma$ denotes the time to execute the
critical section and $\delta$ the communication delay.

The worst-case waiting time is $w_{\text{worst}} \le (n-1)(\sigma + \delta) + \sigma/2 = O(n)$.
The exact expected waiting time of the algorithm is

$$\overline{w} = (\sigma + \delta)(\overline{n} - nP_n) - (\delta + \sigma/2)(1 - P_0 - P_n) + 2\delta P_0,$$

where $\overline{n}$ denotes the average number of nodes in the queue and the critical section.



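Proof The following equations hold.

$$P_0(t+h) = (1 - n\lambda h)\,P_0(t) + \mu h\,P_1(t) + o(h). \qquad (3)$$

Under steady state equilibrium, $P_0'(t) = 0$ and hence,

$$\mu P_1(t) - n\lambda P_0(t) = 0, \quad \text{and} \quad P_1(t) = n(\lambda/\mu)\,P_0(t). \qquad (4)$$

Similarly, for any k such that $1 \le k < n$, one can write

$$P_k(t+h) = \bigl(1 - (n-k)\lambda h - \mu h\bigr) P_k(t) + (n-k+1)\lambda h\,P_{k-1}(t)
+ \mu h\,P_{k+1}(t) + o(h),$$

and

$$\frac{1}{h}\bigl(P_k(t+h) - P_k(t)\bigr) = -(n-k)\lambda P_k(t) + (n-k+1)\lambda P_{k-1}(t)
+ \mu P_{k+1}(t) - \mu P_k(t) + o(h)/h.$$

Classically, set $\rho = \lambda/\mu$. Proceeding similarly, since under steady state equilibrium
$P_k'(t) = 0$,

$$P_{k+1}(t) = \bigl(1 + (n-k)\rho\bigr) P_k(t) - (n-k+1)\rho\,P_{k-1}(t), \qquad (5)$$

and, for any k ($1 \le k \le n$),

$$P_k(t) = \frac{n!}{(n-k)!}\,\rho^k P_0(t). \qquad (6)$$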

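For notational brevity, let $P_k$ denote $P_k(t)$. By Eq. (6), we can now compute the average
number of nodes in the queue and the critical section under the form

$$\overline{n} = \sum_{k=0}^{n} k P_k = P_0 \sum_{k=0}^{n} k\,\frac{n!}{(n-k)!}\,\rho^k.$$

Since the system will always be in one of the $(n + 1)$ states $S_0, \ldots, S_n$,
$\sum_{k=0}^{n} P_k = 1$. Now using expressions of $P_k$ in terms of $P_0$ yields

$$\sum_{k=0}^{n} P_k = P_0 \sum_{k=0}^{n} \frac{n!}{(n-k)!}\,\rho^k = 1.$$

Thus,

$$P_0 = \left(\sum_{k=0}^{n} \frac{n!}{(n-k)!}\,\rho^k\right)^{-1}. \qquad (7)$$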

Let there be k nodes in the system when a node i generates a request. Then $(k - 1)$ nodes
execute their critical section, and one node executes the remaining part of its critical section
before i gets the token. Thus, when there are k ($k > 0$) nodes in the queue, the waiting time of
a node for the token is

$w_k = (k - 1)$ (time to execute the critical section + communication delay)
+ average remaining execution time, or

$$w_k = (k-1)(\sigma + \delta) + \sigma/2 \qquad (8)$$

(where $\sigma$ denotes the time to execute the critical section and $\delta$ the communication delay).

When there are zero nodes in the queue, the waiting time is $w_0 = 2\delta$ = total communication
delay of one request and one token message. Hence the expected waiting time is

$$E(w) = \overline{w} = \sum_{k=0}^{n-1} w_k P_k
= 2\delta P_0 + \sum_{k=1}^{n-1} \Bigl\{(k-1)(\sigma + \delta) + \frac{\sigma}{2}\Bigr\} P_k$$

$$= (\sigma + \delta)\sum_{k=1}^{n-1} k P_k - (\sigma + \delta)\sum_{k=1}^{n-1} P_k
+ \frac{\sigma}{2}\sum_{k=1}^{n-1} P_k + 2\delta P_0,$$

and

$$\overline{w} = (\sigma + \delta)(\overline{n} - nP_n) - (\delta + \sigma/2)(1 - P_0 - P_n) + 2\delta P_0. \qquad (9)$$

By Eq. (9), we know the exact value of the expected waiting time of a node for the token in
algorithm A. Moreover, by Eq. (8), the worst-case waiting time is
$w_{\text{worst}} \le (n-1)(\sigma + \delta) + \sigma/2 = O(n)$. □
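
The steady-state probabilities and the expected waiting time can be evaluated numerically. The sketch below assumes the product form $P_k = \frac{n!}{(n-k)!}\rho^k P_0$ with $\rho = \lambda\sigma$ (since $\mu = 1/\sigma$), computes the expected waiting time once as the direct sum over the states and once through the closed form (9), and returns both so that they can be compared. Function and parameter names (`steady_state`, `expected_wait`, `lam`, `sigma`, `delta`) and the sample values are illustrative assumptions.

```python
from math import perm  # perm(n, k) = n! / (n - k)!  (Python 3.8+)


def steady_state(n, rho):
    """Steady-state probabilities [P_0, ..., P_n] of the birth-and-death process."""
    weights = [perm(n, k) * rho ** k for k in range(n + 1)]
    total = sum(weights)                  # equals 1 / P_0
    return [w / total for w in weights]


def expected_wait(n, lam, sigma, delta):
    """Expected waiting time: direct sum over states, and closed form (9)."""
    rho = lam * sigma                     # rho = lambda / mu with mu = 1 / sigma
    P = steady_state(n, rho)
    direct = 2 * delta * P[0] + sum(
        ((k - 1) * (sigma + delta) + sigma / 2) * P[k] for k in range(1, n)
    )
    n_bar = sum(k * P[k] for k in range(n + 1))
    closed = ((sigma + delta) * (n_bar - n * P[n])
              - (delta + sigma / 2) * (1 - P[0] - P[n])
              + 2 * delta * P[0])
    return direct, closed


direct, closed = expected_wait(n=6, lam=0.5, sigma=1.0, delta=0.2)
print(direct, closed)
```

Both values coincide up to floating-point rounding, as the algebra leading to Eq. (9) predicts.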

4.2 Average waiting time of algorithm A.

The above computation of $\overline{w}$ yields the asymptotic form of an upper bound on the average
waiting time $\overline{w}$ of A.

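Theorem 4.2 If $\rho < 1$, the average waiting time of a node for the token in A is asymptotically
(when $n \to +\infty$),

$$\overline{w} \le (\sigma + \delta)\,n\,(1 - 2e^{-1/\rho}) - (\delta + \sigma/2)
+ (\sigma/2 + 3\delta)\,e^{-1/\rho}.$$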

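Proof Assume $\rho = \lambda/\mu < 1$. When $n$ is large, we can bound $\bar{w}$ from above in Eq. (9) as follows.

First, $1 - P_0 - P_n = 1 - P_0 - n!\,\rho^n P_0$, where $P_0 = \left(\sum_{r=0}^{n} \frac{n!}{(n-r)!}\,\rho^r\right)^{-1}$.
Since $\rho < 1$, and for large $n$,

$$1 - P_0 - P_n < \cdots < 1 - \cdots - e^{-1/\rho}. \tag{10}$$

Next,

$$P_n = n!\,\rho^n P_0 > \cdots = e^{-1/\rho}.$$

Now, with $\bar{N} = \sum_{k=0}^{n} k P_k$,

$$\bar{N} - nP_n = P_0 \sum_{i=1}^{n} i\,\frac{n!}{(n-i)!}\,\rho^i - nP_n.$$

Therefore,

$$\bar{N} - nP_n < n - n e^{-1/\rho} - \rho^{-1} e^{-1/\rho}/n! - n e^{-1/\rho} < n\,(1 - 2e^{-1/\rho}) - \rho^{-1} e^{-1/\rho}/n!. \tag{11}$$

Thus, by Eqs. (9), (10) and (11), we have, when $n$ is large,

$$\bar{w} < (\sigma+\delta)\,n\,(1 - 2e^{-1/\rho}) - (\delta + \sigma/2)(1 - e^{-1/\rho}) + (\cdots + 3\delta)\, e^{-1/\rho},$$

which yields the desired upper bound for $\rho < 1$.
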

□ 
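
The inequality $P_n > e^{-1/\rho}$ used in the proof can be checked numerically. Dividing $n!\,\rho^n$ by the normalising sum $\sum_{r=0}^{n} \frac{n!}{(n-r)!}\rho^r$ and substituting $j = n - r$ gives $P_n = 1/\sum_{j=0}^{n} \rho^{-j}/j!$, a truncated exponential series, so that $P_n$ decreases to $e^{-1/\rho}$ as $n$ grows. The Python sketch below illustrates this; the function name `p_n` and the sample values of $n$ and $\rho$ are assumptions made for the example.

```python
from math import exp, factorial


def p_n(n, rho):
    """P_n = n! rho^n P_0, rewritten as 1 / sum_{j=0}^{n} rho^(-j) / j!."""
    return 1.0 / sum(rho ** (-j) / factorial(j) for j in range(n + 1))


if __name__ == "__main__":
    # Illustrative values only (assumed, not taken from the paper).
    for rho in (0.2, 0.5, 0.9):
        for n in (1, 5, 10):
            print(rho, n, p_n(n, rho), exp(-1 / rho))
```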
Corollary 4.1 If $\rho < 1$, and irrespective of the values of $\sigma$ and $\delta$, the worst-case waiting time
of a node for the token in A is $O(n)$ when $n$ is large.



5 Randomized bounds for the message complexity of A in arbitrary networks

The network considered now is general. The exact cost of a path reversal in a rooted tree $T_n$ is
therefore much more difficult to compute. The message complexity of the variant A' of algorithm 
A for arbitrary networks is of course modified likewise. 

Let $G = (V, E)$ denote the underlying graph of an arbitrary network, and let $d(x, y)$ denote the
distance between two given vertices $x$ and $y$ of $V$. The diameter of the graph $G$ is defined as

$$D = \max_{x, y \in V} d(x, y).$$

Lemma 5.1 Let $|V| = n$ be the number of nodes in $G$. The number of messages $M_n$ used in the
algorithm A' for arbitrary networks of size $n$ is such that $0 \le M_n \le 2D$.

Proof Each request for the critical section is satisfied by sending at least zero messages (whenever
the request is made by the root) and at most $2D$ messages, whenever the whole network must
be traversed by the request message. □
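
To make the bound of Lemma 5.1 concrete, the diameter $D$ of a given network can be computed by running a breadth-first search from every vertex. The following Python sketch does so for a graph given as an adjacency dictionary and returns the pair $(0, 2D)$; the representation, the function names and the 6-node ring used as an example are assumptions chosen for illustration.

```python
from collections import deque


def eccentricity(adj, source):
    """Largest BFS distance from `source` in the graph `adj` (dict of lists)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    if len(dist) != len(adj):
        raise ValueError("graph is not connected")
    return max(dist.values())


def diameter(adj):
    """D = max over all pairs (x, y) of d(x, y)."""
    return max(eccentricity(adj, v) for v in adj)


def message_bounds(adj):
    """Bounds of Lemma 5.1 on the number of messages: 0 <= M_n <= 2D."""
    return 0, 2 * diameter(adj)


if __name__ == "__main__":
    # Hypothetical example network: a ring of 6 nodes.
    ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
    print(diameter(ring), message_bounds(ring))
```

This all-pairs BFS costs $O(n(n + |E|))$ time, which is adequate for small illustrative networks.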

In order to bound $D$, we make use of some results of Béla Bollobás about random graphs [2]
which yield the following summary results.



Summary results

1. Consider very sparse networks, e.g. those for which the underlying graph $G$ is only barely
connected. For almost every such network, the worst-case message complexity of the algorithm
is $O(\log n / \log\log n)$.

2. For almost every sparse network (e.g. such that $M = O(n/2)$), the worst-case message
complexity of the algorithm is $\Theta(\log n)$.

3. For almost every r-regular network, the worst-case message complexity of the algorithm is
$O(\log n)$.



6 Conclusion and open problems 

The waiting time of algorithm A is of course highly dependent on the values of many system
parameters such as the service time $\sigma$ and the communication delay $\delta$. Yet, the performance of A,
analysed in terms of average and worst-case complexity measures (time and message complexity),
is quite comparable with the best existing distributed algorithms for mutual exclusion (e.g. [1,
13]). However, algorithm A is only designed for complete networks and is a priori not fault
tolerant, although a fault tolerant version of A could easily be designed. The use of direct
one-one correspondences between combinatorial structures and associated probability generating
functions proves here a powerful tool to derive the expectation and the second moment of the cost of
path reversal; such combinatorial and analytic methods are more and more required to complete
average-case analyses of distributed algorithms and data structures [5]. From such a point of view,
the full analysis of algorithm A completed herein is quite general. Nevertheless, the simulation
results obtained in the experimental tests performed in [15] also show a very good agreement
with the average-case complexity value computed in the present paper.

Let C be the class of distributed tree-based algorithms for mutual exclusion, e.g. [1, 13].
There still remain open questions about A. In particular, is the algorithm A average-case optimal
in the class C? By the tight upper bound derived in [7], we know that the amortized cost of
path reversal is $O(\log n)$. It is therefore likely that the average complexity of A is $O(\log n)$,
and hence that algorithm A is average-case optimal in its class C. The same argument can be
derived from the fact that the average height of n-node binary search trees is $\Theta(\log n)$ [5].
Appendix

In Subsection 2.1, we developed the tools which make it possible to derive the first two moments
of the cost of path reversal, viz. the expected cost and the variance of the cost (see Subsection
2.2). However, by Theorem 2.1, the average cost of path reversal may be directly proved to be
$H_{n-1}$ by solving a simple recurrence equation.

Proposition 6.1 The expected cost of path reversal and the average message complexity of
algorithm A evaluate to $\mathrm{cost}(\varphi(T_n)) = C_n = H_{n-1}$ (resp.).

Proof Let $C_k$ be the average cost of path reversal performed on an initial ordered $k$-node tree
$T_k$, $1 \le k \le n$. $C_k$ is the expected height of $T_k$, or the average number of nodes on the path from
the node $x$ in $T_k$ at which the path reversal is performed to the root of $T_k$.

We thus have $C_1 + 1$ = average number of nodes on the path from $x$ to the root in the tree
$T_1$ with one node ($C_1 + 1 = 1$), $C_2 + 1$ = average number of nodes on the path from $x$ to the
root in a tree $T_2$ with two nodes ($C_2 + 1 = 2$), etc. And therefore, the following identity holds

$$C_{k+1} = \frac{1}{k}\bigl((C_1+1) + (C_2+1) + \cdots + (C_k+1)\bigr) \quad \text{for } k = 1, \ldots, n. \tag{12}$$
We can rewrite this recurrence in two equivalent forms:

$$k\,C_{k+1} = k + C_1 + C_2 + \cdots + C_k \quad (k \ge 1),$$
$$(k-1)\,C_k = (k-1) + C_1 + C_2 + \cdots + C_{k-1} \quad (k \ge 2).$$

Subtracting these equations yields

$$k\,C_{k+1} = 1 + k\,C_k \quad (k \ge 2) \quad \text{and} \quad C_1 = 0.$$

Hence,

$$C_{k+1} = C_k + 1/k \quad (k \ge 2), \qquad C_1 = 0,$$

and the general formula $C_{k+1} = H_k$ follows. Setting $k = n - 1$ finally gives the average cost of path
reversal for ordered n-node trees,

$$C_n = H_{n-1}. \tag{13}$$

□ 
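
The recurrence can also be checked mechanically. The Python sketch below iterates identity (12) in exact rational arithmetic, starting from $C_1 = 0$, and compares each $C_k$ with the harmonic number $H_{k-1}$; the function names are illustrative assumptions.

```python
from fractions import Fraction


def path_reversal_costs(n):
    """Return {k: C_k} for k = 1..n, computed from identity (12) with C_1 = 0."""
    C = {1: Fraction(0)}
    for k in range(1, n):
        # C_{k+1} = ((C_1 + 1) + ... + (C_k + 1)) / k
        C[k + 1] = sum(C[j] + 1 for j in range(1, k + 1)) / k
    return C


def harmonic(m):
    """H_m = 1 + 1/2 + ... + 1/m, with H_0 = 0."""
    return sum((Fraction(1, j) for j in range(1, m + 1)), Fraction(0))


if __name__ == "__main__":
    C = path_reversal_costs(10)
    for k in sorted(C):
        print(k, C[k], harmonic(k - 1))
```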



Note that if we let $|LB(T)| = |RB(T)|$ denote the mean length of a right or left branch of a
tree $T \in \mathcal{T}_{n-1}$, we also have (by Lemma 2.1)

$$\mathrm{cost}(\varphi(T_n)) = |LB(T)|,$$

where $|LB(T)|$ is averaged over all the $(n-1)!$ binary tournament trees in $\mathcal{T}_{n-1}$.
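
This remark can be illustrated by brute force for small sizes. The Python sketch below rests on two assumptions that are not stated in this section: the tournament tree of a permutation is built recursively with the smallest element at the root (the elements to its left forming the left subtree), and the length of the left branch is counted in nodes. Under these assumptions, the mean left-branch length over all $m!$ permutations of size $m$ equals $H_m$, which agrees with Eq. (13) for $m = n - 1$.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def left_branch_nodes(perm):
    """Number of nodes on the left branch of the tournament tree of `perm`.

    Assumed construction: the root is the minimum of the sequence and the
    left subtree is built from the elements located before it.
    """
    count = 0
    prefix = list(perm)
    while prefix:
        root = prefix.index(min(prefix))  # position of the current root
        count += 1
        prefix = prefix[:root]            # descend into the left subtree
    return count


def mean_left_branch(m):
    """Exact mean left-branch length over all m! permutations of size m."""
    total = sum(left_branch_nodes(p) for p in permutations(range(m)))
    return Fraction(total, factorial(m))


if __name__ == "__main__":
    for m in range(1, 7):
        print(m, mean_left_branch(m))
```

The enumeration is exponential in $m$ and is only meant as a check for very small trees.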